Automatic Authorship Attribution

Authors

  • Efstathios Stamatatos
  • Nikos Fakotakis
  • George K. Kokkinakis
Abstract

In this paper we present an approach to automatic authorship attribution dealing with real-world (or unrestricted) text. Our method is based on the computational analysis of the input text using a text-processing tool. Besides the style markers relevant to the output of this tool, we also use analysis-dependent style markers, that is, measures that represent the way in which the text has been processed. Neither word frequency counts nor other lexically based measures are taken into account. We show that the proposed set of style markers is able to distinguish texts of various authors of a weekly newspaper using multiple regression. All the experiments we present were performed using real-world text downloaded from the World Wide Web. Our approach is easily trainable and fully automated, requiring no manual text preprocessing or sampling.
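The abstract does not spell out how multiple regression is turned into an author decision, so the following is only a minimal sketch of one common setup, not the authors' implementation: fit one least-squares response function per author (target 1 for that author's texts, 0 otherwise) and assign a new text to the author with the highest response. The function names, the two marker columns, and the toy numbers are all hypothetical; the paper's actual style markers come from its text-processing tool.

```python
import numpy as np


def fit_author_regressions(markers, authors):
    """Fit one linear response function per author by ordinary least squares.

    markers: (n_texts, n_markers) array of style-marker values (hypothetical).
    authors: list of author labels, one per text.
    """
    labels = sorted(set(authors))
    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(len(markers)), markers])
    # One 0/1 target column per author.
    targets = np.array([[1.0 if a == lab else 0.0 for lab in labels]
                        for a in authors])
    coefs, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return labels, coefs


def predict_author(marker_vector, labels, coefs):
    """Return the author whose response function gives the highest value."""
    responses = np.concatenate([[1.0], marker_vector]) @ coefs
    return labels[int(np.argmax(responses))]


# Made-up marker values for illustration only (e.g. tokens per sentence,
# punctuation marks per token); they are not taken from the paper.
train_markers = np.array([[10.0, 0.10], [11.0, 0.12],
                          [25.0, 0.30], [24.0, 0.28]])
train_authors = ["A", "A", "B", "B"]

labels, coefs = fit_author_regressions(train_markers, train_authors)
print(predict_author(np.array([10.5, 0.11]), labels, coefs))
```

Because the marker vectors are plain numeric arrays, the same sketch works with any set of non-lexical measures, provided each text is reduced to a fixed-length vector first.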

Similar resources

Personal Sense and Idiolect: Combining Authorship Attribution and Opinion Analysis

Subjectivity analysis and authorship attribution are very popular areas of research. However, work in these two areas has been done separately. Our conjecture is that by combining information about subjectivity in texts and authorship, the performance of both tasks can be improved. In the paper a personalized approach to opinion mining is presented, in which the notions of personal sense and id...

Domain Independent Authorship Attribution without Domain Adaptation

Automatic authorship attribution, by its nature, is much more advantageous if it is domain (i.e., topic and/or genre) independent. That is, many real world problems that require authorship attribution may not have in-domain training data readily available. However, most previous work based on machine learning techniques focused only on in-domain text for authorship attribution. In this paper, w...

Reducing Vector Space Dimensionality in Automatic Classification for Authorship Attribution

For automatic classification, the implications of having too many classificatory features are twofold. On the one hand, some features may not be helpful in discriminating classes and should be removed from the classification. On the other hand, redundant features may produce negative effects as their number grows; therefore their detrimental impact must be minimized or limited. In text c...

Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs

We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation allows integrating different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest path walks over integrated syntactic graphs and apply them to determine the authors of docume...

Experiments on authorship attribution by intertextual distance in English

How can it be said that texts are "near" or "distant" from one another? Are different texts by a single author more similar than texts by different authors? To answer these questions, a method is proposed by combination of the calculus of intertextual distance with automatic clustering and tree-classification. A blind test and some additional experiments show that this method offers an interest...

A Survey on Authorship Analysis

The paper discusses the problem of authorship analysis and its different types, such as authorship attribution, authorship identification, authorship profiling, and plagiarism detection. It also addresses the issues in Indian-language text. Keywords: authorship attribution, authorship profiling, plagiarism detection, text classification.

Publication date: 1999